Binary Packet
Who is this article for? This article is written for people with an understanding of networking and an intermediate understanding of the language they plan to use along with a strong understanding of bitwise operations. It's intended to demonstrate a way to form packets in a compact way working on the bit level rather than the byte or string level. In other words, this is for people that need to send a lot of information, like in a game, using as little bandwidth as possible. This article will mainly deal with TCP where the socket is a stream of data. Understanding that TCP is a reliable protocol that ensures data is delivered in order is necessary for understanding this article. Also this article uses examples dealing with a server-client model, but the ideas can easily be applied to other models. What is a binary packet? Simply put a binary packet is a contiguous array of bytes representing a data format used to serialize data. What does this mean exactly? Well it's a way to pack data tightly together. Lets say you wanted to send a player's position represented by the mathematical vector of 2 floating point values, (10.5, 20.6). Each element in the vector is a single float of 4 bytes. So we can store two values in 8 bytes. (We'll cover other methods later). Moving away from the byte level writing a boolean would use 1 bit. Implementation wise the binary packet is just a binary writer and reader rolled into one class. The term class is used because object-oriented programming is most commonly used to implement this. In the end an interface is made giving the user access to write and read methods of various data types. Packet format, use cases and considerations Before going into the implementations the reason for a binary packet needs to be explained. At its core the data being sent over a socket is meaningless. Much like the data in RAM it only has the meaning you give to it, so 32 bits can be seen as an integer, a float, or even 32 boolean values. So in order to use a binary packet you have to decide on a format for your messages. Other formats exist. An XML format or a string format with delimiters can also be used, but are often verbose. A format will be proposed and dissected running through common use cases. The format will be of the form: | | | x N | |} This format has a header (the prefix of the packet) representing the length. In our case the length won't be the byte length. It will actually be the number of bits. You may be wondering why a length is needed. UDP doesn't have this step since packets are a fixed size. The idea in TCP is to treat each set of messages as a transaction — this article refers to them as packets (TCP really doesn't have a concept of a packet on the interface level). Doing this you can bundle things like state updates safely together without the fear that they might be evaluated on different time-steps. As an example, imagine sending 200 state updates and the packet is fragmented. You don't want to handle half the state updates in one time step and the other half in the next one. Any physics or gameplay interaction could be desynchronized by such an event. That's why the idea of a packet is used. Following the header there is a body. The body is a sequence of event identifiers (event ID) and data corresponding to that event ID. But what is an event ID? It's simply an enumeration (names mapped to a constant integer). In a server client model you have ServerEvents and ClientEvents that hold these values. ServerEvents contains events the server sends to the client. These include things like ping, the login result, and state updates. The ClientEvents contains events the client sends to the server such as pong, login, and input data. namespace SocketEvents { public class ServerEvents { public const uint None = 0; // None public const uint Ping = 1; // Ping for measuring RTT public const uint LoginResult = 2; // Was the login accepted? public const uint StateUpdates = 3; // Game state updates } public class ClientEvents { public const uint None = 0; // None public const uint Pong = 1; // Pong for measuring RTT, must be sent after ServerEvents.Ping public const uint Login = 2; // Login information public const uint Input = 3; // Client input } } The data section is optional. For example, ping doesn't have any data section. The data section is just extra information for that event ID. The key is that it's in a format. When you are processing a packet you will be reading event IDs and then handling the data section sequentially. If this isn't clear then the following example will help you to understand. Imagine the client sends a packet of the format: The first message when processed will read in ClientEvents.Login and it would immediately expect a String_Username and a String_Password. An error would occur if all that was in the packet was: In that corrupt packet when the program tried to read the username string it would try to interpret the 4 byte integer as a string. If it failed to do that then you'd have a corrupt packet which when coming from the client is extremely suspicious. Moving on, you may be wondering how a string is written and read. Just like in allocating a string in memory two common methods arise. Either a terminating character such as a null is appended to the end or the length of the string is prepended before the characters. This will be covered in the section about strings. To illustrate the bit part of this writer a boolean was quickly mentioned. The true and false values can be represented by a single bit. Computers pad the bits so that they can be stored in RAM at convenient aligned byte positions. A binary packet doesn't have to do this and can align data to the bit level. For example, say you had the movement input up, down, left, right to serialize. Each state is either true or false representing a key state. So a packet for that input might look like: It costs only 4 bits for the 4 states. You may be familiar with this concept as bit flags or bit vectors. Boolean isn't the only data type that benefits from this packing strategy. Imagine you have a variable on your server that doesn't change much, such as the max health of a character. Lets say this number is sent to the client or known to be 100. Any value below 100 can be represented with 7 bits. So when sending state changes to health the cost is only 7 bits for the health value. The section on N-Bit integers covers this in more detail. What if you have a value for an integer that is normally 100, but it might be much larger once in a great while. The concept of a variable-width encoded integer is covered in the section Dynamic Integer. The idea being you can use less bits to encode smaller numbers and more bits to encode larger numbers. A solid example of this is UTF8. Serializing integers is fine, but once in a while you'll need to send floating point data. Sending 32 bit floats or 64 bit doubles tends to be costly when you need to send a lot of them. Luckily for certain applications you might not need full precision or you might know the minimum and maximum value. Custom resolution floats solves this problem and is covered in the section Custom Resolution Float. Topics Underlying Data Structure The concept behind a binary bit packet is simply a long bit vector (sequence of bits). This is probably the most complicated part to get right if you have never used bitwise operators. The class consists of a buffer object which is just a dynamic array of bytes along with an integer bitIndex that holds the current bit position for writing and reading. Say we need to serialize a bit with the value one into the buffer at the bit index 0. The first step is to expand the buffer if it's too small. while ((bitIndex + 1 + 7) / 8 > buffer.Count) { buffer.Add(new byte()); } Go through a few different bitIndex values to understand how the algorithm works. The 1 in the equation just represents the number of bits, so a byte would have an 8 and not a 1. For a single bool the while could be an if since the largest the buffer can increase is 1 byte when writing a bit. The second step is to copy the bit into the buffer at the bit index. buffer/ 8 |= (byte)(1 << (7 - bitIndex % 8)); In this example you are taking 1 (in binary it would look like 00...001) and shifting all the bits left 7 - bitIndex modulus 8 (remember modulus has a higher precedence than subtraction normally). So if the bitIndex was one you'd be shifting over 6 times and end up with a value on the right side of 64 or 0b01000000. This value is or'ed with the buffer byte setting the second bit in that buffer's byte. This example only covered writing a single bit. To handle larger sequences of bits study the code appended to this article. It's a good idea to attempt doing this part on your own for practice first. Try to write a byte to the buffer given any bit index. Strings Prepending the length and then characters is a good way to handle strings. Strings are probably one of the easiest to understand concepts. Just need to prepend a length then list a bunch of ASCII characters, right? Wrong. In this day and age unicode is used, so supporting it should be on your list of priorities. On the plus side UTF8 already uses a variable-width encoding. If you aren't familiar with this concept basically ASCII characters cost 1 byte and other characters cost between 2 and 4 bytes. Most characters can be represented in 2 bytes. The only way to support unicode in your binary writer/reader is to understand it and implement functions for handling it. Some libraries like .NET have support for it that makes using it trivial, but if you're not using a .NET language writing an encoder and decoder is still pretty easy. (Or use a library for unicode support). In any case make sure you read and understand this article covering UTF8. This idea of unicode doesn't limit the use of clever tricks. Even though UTF-8 is variable length you can still using tricks to cut the size down especially for ASCII. Huffman coding is the perfect example. N-Bit Integer Sometimes you don't need to use exactly 8, 16, or 32 bits. Maybe you want to use 24? This is where two's complement comes into play. For unsigned values you need to find the maximum value you want to store and calculate the bits required. Say you needed to store the values 0 through 3 then you can easily use 2 bits. The interface for this function will probably be set up with an unsigned integer value and a bits parameter where the user can specify the number of bits to store the value in. Again study the code appended to this article. Dynamic Integer After understanding UTF8 this method will seem like a common sense optimization. Let's say you know the max number you'll be serializing is less than 31 most of the time, but you want to allow for values that are larger in case they happen. Well integers between 0 and 31 can fit in 5 bits. So in order to accomadate for larger numbers we can write a bool to say if the number was contained in 5 bits. False can mean it was contained in 5 bits and true means it wasn't. We'll call these continuation bits. You prepend these to the sequence of bits that contains the value. What if the bit value is true? Obviously with 0 you would stop and you knew the value fit in 5 bits. When the continuation bit is 1 you'll need another 5 bits and another continuation bit. So to make sure you're following the value 32 for an unsigned integer with 5 bit intervals is encoded as: 0b10 00001 00000 You'll notice the least significant bit is on the right. For each extra sequence of bits another continuation bit is used until the sequence is greater than or equal to 32. In the case where the sequence is 32 bits an extra 0 at the end is unneeded since it's known that the sequence has reached the max bits. To illustrate this look at the table below with unsigned examples: This concept is very nice to understand and can be used to save small amounts of bandwidth or large in the case of UTF8 compared to UTF32. Custom Resolution Float To understand a custom resolution float let's start with an example of a long range radar in a game. It might have position data for many different entities. The main concept though is that these positions don't need full precision. Imagine a 100x100 pixel map and you have 10 types of entities and 100 entities within the radar. Each entity is represented by their floating point position with an x and y coordinate. For the radar example we map entities relative to the player using: Vector2 position = (entity.Position - player.Position) * scalar + Vector2(50.0, 50.0); Scalar is the ratio of the real map to the radar in order to scale the positions. Now if all the entities are to be drawn within the radar we can offset them into the 0.0 to 100.0 range as shown. How many bits does it take to serialize the numbers 0.0 to 100.0? We talked about integers, but didn't cover floats. We can actually choose any arbitary number of bits. Lets say we use 5 bits. In 5 bits we can represent 0 to 31. Our floating point range can be mapped to these 32 values using: newValue = (value - min) / (max - min) * (2^bits - 1) or in our example: newPositionX = (position.X - 0.0) / (100 - 0) * 31 This obviously restricts the possible positions in the radar. Entities next to one another might appear on top of one another in the map. Using more bits fixes this problem. At any rate even if you choose 7 bits you're still not anywhere near 32 bits. To find the precision of using N-Bits you simply need to do: precision = (max - min) / (2^bits - 1) In our example we get: 3.22 = (100 - 0) / 31 This means that when plotting a player they will be within 3 pixels of their correct spot on the radar. So how would this look in a packet: | | | x EntityCount | |} Using the information: The bit resolution of 6 bits gives us a precision of 1.6 pixels. The equation to find how many bits that message costs is given by: 32 + 7 + 8 + (4 + 6 + 6) * 100 entities = 1647 bits = 206 bytes This byte count is small enough to be included in with other updates. If it was over 1400 bytes it might not be as friendly on the network. There is a special case that was left out. Imagine using the range -100 to 100. The value zero cannot be represented perfectly no matter what bit resolution is used. But what if that range was for velocity and you need to be able to represent zero. This problem is much simpler to solve than it sounds. When the minimum is less than zero and the maximum is greater than zero and the value is zero then the value stored in the bits is zero. If the value is not zero then the value is stored in the range 1 to (2^bits - 2). So for 6 bits we can store 0 to 62 and then we offset that by one so that the range is 1 to 63 and as mentioned zero is reserved for the value zero. Just so this is clear when the minimum and maximum are greater than or equal to zero the other method is used with no special case for zero. The code at the bottom has an example of this method. Things to keep in mind Often people serialize things they don't need to. Maybe it's because it's easier than deciding if the client already knows about them or easier than keeping track of things. As an example if you're sending every player's health, mana and stats every turn you have much larger problems that bit packing just can't fix. Keep track of everything the client knows about. You should never be telling the client things they already know. If an entity walks into a player's range do a full state update, then on the following updates simply do delta updates. This strategy will greatly reduce your bandwidth. As a quick lesson, notice in the section on the custom resolution float how it builds information for every entity's position even if the client might already know about the entity's position in the radar and nothing is moving. If you knew a lot of players or entities are going to be idle like in a social game you could put an entity ID in and only update entities on the radar that are moving. Also another strategy is to only update the radar every second rather than every frame and stop updating it for a client that is idle for over a minute. Tons of strategies like this will greatly cut down on bandwidth. Remember to actually calculate estimates for everything. Don't just guess that something is costly or you might end up missing some simple optimization that can make it cost-effective. The idea of compressing these packets comes up from time to time. Binary packets tend to have very high entropy. Meaning they are very random making it hard to compress most of the data contained in them. Also their size tends to be small which further degrades any chance of compressing them effectively. In short don't worry about compression. Target larger problems. Conclusion Binary packets that work on the bit level offer many simple advantages that are nearly transparent to the programmer as they format packets. The methods listed above are some of the most useful to have. Some other tricks or ideas were left out. For instance, this article never mentions methods to seamlessly write mathematical vectors and other basic classes. Some people like to add those in because it saves them time writing the same code for serializing vectors. Other things such as support for arrays and associative arrays were left out for a reason. Writing an array is normally a unique thing. Writing boxed functions to do it might not always be optimal. As a note though putting the length then the sequence of elements is normally the preferred approach. Code For convenience the code has been placed on its own page found at the link: Binary Packet Code.